State prediction error
↔︎ Reward prediction error (RPE)